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Abstract. For a large Hermitian matrix A E C NxN , it is often the case 
that the only affordable operation is matrix-vector multiplication. In such 
case, randomized method is a powerful way to estimate the spectral density 
(or density of states) of A. However, randomized methods developed so far 
for estimating spectral densities only extract information from different ran¬ 
dom vectors independently, and the accuracy is therefore inherently limited 
to 0(1/ V N v ) where N v is the number of random vectors. In this paper we 
demonstrate that the u O(l/y/N v ) barrier” can be overcome by taking advan¬ 
tage of the correlated information of random vectors when properly filtered 
by polynomials of A. Our method uses the fact that the estimation of the 
spectral density essentially requires the computation of the trace of a series 
of matrix functions that are numerically low rank. By repeatedly applying A 
to the same set of random vectors and taking different linear combination of 
the results, we can sweep through the entire spectrum of A by building such 
low rank decomposition at different parts of the spectrum. Under some as¬ 
sumptions, we demonstrate that a robust and efficient implementation of such 
spectrum sweeping method can compute the spectral density accurately with 
0(N 2 ) computational cost and O(N) memory cost. Numerical results indi¬ 
cate that the new method can significantly outperform existing randomized 
methods in terms of accuracy. As an application, we demonstrate a way to 
accurately compute a trace of a smooth matrix function, by carefully balancing 
the smoothness of the integrand and the regularized density of states using a 
deconvolution procedure. 


1. Introduction 


Given an Ax A Hermitian matrix A, the spectral density , also commonly referred 
to as the density of states (DOS), is formally defined as 



(i) 


Here 5 is the Dirac distribution commonly referred to as the Dirac S- “function” 
(see e.g. mii), and the A^’s are the eigenvalues of A, assumed here to be labeled 
non-decreasingly. 

The DOS is an important quantity in many physics problems, in particular in 
quantum physics, and a large volume of numerical methods were developed by 
physicists and chemists |H 0 [6l [3 0 0 [lUl El El El HH E] for this purpose. 
Besides being used as a qualitative visualization tool for understanding spectral 
characteristics of the matrix, the DOS can also be used as to quantitatively compute 


l 


2 


L. LIN 


the trace of a matrix function, as given in the formal formulation below 


( 2 ) 


N n OO 

Tr If (A)} = ^ /(A*) = IV / /(t)^(t) df. 

*=i • / -°° 


Here /(f) is a smooth function, and the formal integral in Eq. ([2]) should be inter¬ 
preted in the sense of distribution. 

If one had access to all the eigenvalues of A, the task of computing the DOS 
would become a trivial one. However, in many applications, the dimension of A 
is large. The computation of its entire spectrum is prohibitively expensive, and a 
procedure that relies entirely on multiplications of A with vectors is the only viable 
approach. Fortunately, in many applications A only has O(N) nonzero entries, and 
therefore the cost of matrix-vector multiplication, denoted by c ma t V ec, is O(N). In 
some other cases the matrix is a dense matrix but fast matrix-vector multiplication 
method still exists with 0(N\og p N) cost, where p is a integer that is not too 
large. This is the case when the matrix-vector multiplication can be carried out 
effectively with fast algorithms, such as the fast Fourier transform (FFT), the fast 
multipole method (FMM) [151. the hierarchical matrix HZ!, and the fast butterfly 
algorithm [18], to name a few. 

Rigorously speaking, the DOS is a distribution and cannot be directly approx¬ 
imated by smooth functions. In order to assess the accuracy of a given numerical 
scheme for estimating the DOS, the DOS must be properly regularized. The basic 
idea for estimating the DOS is to first expand the regularized DOS using simple 
functions such as polynomials. Then it can be shown that the estimation of the 
DOS can be obtained by computing the trace of a polynomial of A , which can then 
be estimated by repeatedly applying A to a set of random vectors. This proce¬ 
dure has been discovered more or less independently by statisticians m and by 
physicists and chemists M, and will be referred to as Hutchinson’s method in 
the following. In physics such method is often referred to as the kernel polynomial 
method (KPM) 10] with a few different variants. A recent review on the choice 
of regularization and different numerical methods for estimating the DOS is given 
in [20] . There are also a variety of randomized estimators that can be used in 
Hutchinson’s method, and the quality of different estimators is analyzed in [23]. 

Contribution. To the extent of our knowledge, all randomized methods so far 
for estimating the DOS are based on different variants of Hutchinson’s method. 
These methods estimate the DOS by averaging the information obtained from 
N v random vectors directly. The numerical error, when properly defined, decays 
asymptotically as (D( l/y/Nff). As a result, high accuracy is difficult to achieve: 
every extra digit of accuracy requires increasing the number of random vectors by 
100 fold. 

In this work, we demonstrate that the accuracy for estimating the regularized 
DOS can be significantly improved by making use of the correlated information 
obtained among different random vectors. We use the fact that each point of the 
DOS can be evaluated as the trace of a numerically low rank matrix, and such trace 
can be evaluated by repeatedly applying A to a small number of random vectors, 
and by taking certain linear combination of the resulting vectors. If different set of 
random vectors were needed for different points on the spectrum the method will 
be prohibitively expensive. However, we demonstrate that it is possible to use the 
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same set of random vectors to “sweep through” in principle the entire spectrum. 
Therefore we call our method a “spectrum sweeping method”. 

Our numerical results indicate that the spectrum sweeping method can signifi¬ 
cantly outperform Hutchinson type methods in terms of accuracy, as the number of 
random vectors N v becomes large. However, the computational cost and the stor¬ 
age cost can still be large when the DOS needs to be evaluated at a large number 
of points. Furthermore, the accuracy of the spectrum sweeping method may be 
compromised when the right number of randomized vectors is not known a priori. 
We develop a robust and efficient implementation of the spectrum sweeping method 
to overcome these two problems. Under certain assumption on the distribution of 
eigenvalues of the matrix A, and the cost of the matrix-vector multiplication is 
0(N ), we demonstrate that the computational cost of the new method scales as 
0(N 2 ) and the storage cost scales as O(N) for increasingly large matrix dimension 
N. We also demonstrate that the new method for evaluating the DOS can be useful 
for accurate trace estimation as in Eq. Q. 

Other related works. The spectrum sweeping method is not to be confused 
with another set of methods under the name of “spectrum slicing” methods [221G5J 
M 1221 IMl HU- The idea of the spectrum slicing methods is still to obtain a partial 
diagonalization of the matrix A. The main advantage of spectrum slicing methods 
is enhanced parallelism compared to conventional diagonalization methods. Due 
to the natural orthogonality of eigenvectors corresponding to distinct eigenvalues 
of a Hermitian matrix, the computational cost for each set of processors handling 
different parts of the spectrum can be reduced compared to direct diagonalization 
methods. However, the overall scaling for spectrum slicing methods is still 0(N 3 ) 
when a large number of eigenvalues and eigenvectors are to be computed. 

Notation. In linear algebra notation, a vector w £ C m is always treated as a 
column vector, and its conjugate transpose is denoted by w*. For a randomized 
matrix B £ p. g en t r y-wise expectation value is denoted by E[U] and its 

entry-wise variance is denoted by Var[H]. We call B £ C mxn a (real) random 
Gaussian matrix, if each entry of B is real, and follows independently the normal 
distribution Af(0, 1). In the case when n = 1, B is called a (real) random Gaussian 
vector. The imaginary unit is denoted by i. 

The paper is organized as follows. In section [2] we introduce the DOS estimation 
problem. We also demonstrate the Delta-Gauss-Chebyshev (DGC) method, which 
is a variant of the kernel polynomial method, to estimate the DOS. We develop in 
section [3] the spectrum sweeping method based on the randomized estimation of 
the trace of numerically low rank matrices, and demonstrate a robust and efficient 
implementation of the spectrum sweeping method in section [4] We show how the 
DOS estimation method can be used to effective compute the trace of a matrix 
function in section [5] Following the numerical results in section [6j we conclude and 
discuss the future work in section 0 

2. Density of state estimation for large matrices 

Without loss of generality, we shall assume that the spectrum of A is contained 
in the interval (—1,1). For a general matrix with spectrum contained in the interval 
(a, b ), we can apply a spectral transformation 

~ 2 A — (a + b)I 


b — a 
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The spectrum of A is contained in (—1,1), and all the discussion below can be 
applied to A. The fact that the spectral density </>(f) is defined in terms of 
Dirac 5-functions suggests that no smooth function can converge to the spectral 
density in the limit of high resolution, in the usual L p norms (p > 1) jJD). In 
order to compare different numerical approximations to the spectral density in a 
meaningful way, the DOS should be regularized. One simple method is to employ 
a Gaussian regularization 

N 

(3) <M*) = ^9<y{t- K) =Tr \g a (tl - A)\. 

2=1 


Here 

(4) 


9a(s) 


i 

- e 2 cr 2 

Ny/iircr 2 


is a Gaussian function. In the following our goal is to compute the smeared DOS 

0(T' 

The key of computing the DOS is the estimation of the trace of a matrix function 
without diagonalizing the matrix. To the extent of our knowledge, randomized 
methods developed so far are based on Hutchinson’s method or its variants mm- 
The following simple and yet useful theorem is a simple variant of Hutchinson’s 
method and explains how the method works. 


Theorem 1 (Hutchinson’s method). Let A £ C iVxJV be a Hermitian matrix, and 
w £ be a random Gaussian vector, then 

(5) EKdw] = Tr [A], Var [w* Aw] = 2^(ReA ii ) 2 . 


Proof. First 


E [w* Aw] = E 


Y1 w ' ,r .i A '.> 

ij 



Tr[H], 


Here we used that E[w] = 0, Epimr*] = I and w is a real vector. This follows 
directly from that each entry of w is independently distributed and follows the 
Gaussian distribution 1). 

Second, 


Var[w*Hw] =E [(w*dw) 2 


(Tr[A]) 2 ] = E 


^WiWjWkWiAijAki - (Tr[H]) 2 
ijkl 


— AuAkk + 2 y^(ReHjj) 2 — (^~~^ Ag) 2 — 2 y^(ReAjj) 2 . 

ik i=fcj i 


□ 


Using Theorem [lj if we choose W £ R NxN ” to be a random Gaussian matrix, 
then 

Tr (gAtl - A)} w Tr [W*g a (tl - A)W\. 

In practice in order to compute g a (A—tI)W, we can expand g a (A—tI ) into poly¬ 
nomials of A for each t , and then evaluate the trace of polynomial of A. A stable and 
efficient implementation can be obtained by using Chebyshev polynomials. Other 
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choices of polynomials such as Legendre polynomials can be constructed similarly. 
Using the Chebyshev polynomial, g a (tl — A) is approximated by a polynomial of 
degree M as 

M 

(6) g a (tl - A) &gM(tI- A) := ^ m(t)Ti(A). 

1=0 

The coefficients {m(t)} need to be evaluated for each t. Since the Dirac (5-function 
is regularized using a Gaussian function, following the notion in [20] we refer to 
Eq. ^ as the “Delta-Gauss-Chebysliev” (DGC) expansion. Since g a (t — •) is a 
smooth function on (—1,1), the coefficient gi(t) in the DGC expansion can be 
computed as 

(7) M*) = “—— [ , 1 ff a (f-s)T;(s)ds. 

7T 7-i VI - s 2 

Here Sio is the Kronecker <5 symbol. With change of coordinate s = cos 6, and use 
the fact that T)(s) = cos (l arccos(s)), we have 

(8) m(t) = — f g a (t — cos0) cos(W) dd. 

2tt J o 

Eq. ([8]) can be evaluated by discretizing the interval [0, 27 t] using a uniform grid, and 
the resulting quadrature can be efficiently computed by the Fast Fourier Transform 
(FFT). This procedure is given in Alg. [TJ and this procedure is usually inexpensive. 


Algorithm 1 Computing the Delta-Gauss-Chebyshev (DGC) polynomial expan¬ 
sion at a given point t. 

Input: Chebyshev polynomial degree M; 

Number of integration points 2 Ng, with Ng > M; 

Smooth function g a (t — ■)■ 

Output: Chebyshev expansion coefficients {ni{t)}f£. 0 . 

1: Let 0j = j = 0,..., 2Ng — 1. 

2; 9j = 9a(t- COSdj). 

3: Compute g — J~[g\, where F is the discrete Fourier transform. Specifically 

2JV ®- 1 . 2^1 
m= e 2Ne 9 j- 

3 = o 

4: W( 4 ) = Re 9h 1 = 0,..., M. 


Using the DGC expansion and Hutchinson’s method, can be approximated 
by 

M 1 M 

Mt) :=Tr [g™(tI-A)) « ^/q(t)— Tt\W^A)W] = 

1=0 v 1=0 

The resulting algorithm, referred to as the DGC algorithm in the following, is given 
in Alg. [2] 

We remark that the DGC algorithm can be viewed as a variant of the kernel 
polynomial method (KPM) t T0j • The difference is that KPM formally expands 
the Dirac (5-function, which is not a well defined function but only a distribution. 
Therefore the accuracy of KPM cannot be properly measured until regularization 
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Algorithm 2 The Delta-Gauss-Chebyshev (DGC) method for estimating the DOS. 

Input: Hermitian matrix A with eigenvalues between (—1,1); 

A set of points {ti}^£ 1 at which the DOS is to be evaluated; 

Polynomial degree M; Smearing parameter a; 

Number of random vectors N v . 

Output: Approximate DOS i- 

1: for each ti do 

2: Compute the coefficient {y,i{ti)}ff 0 f° r each ti,i = 1 ,... ,Nt using Alg. [l] 

3: end for 

4: Initialize Cfc = 0 for k = 0, ■ ■ ■ , M. 

5: Generate a random Gaussian matrix W € R JVxJV ”. 

6: Initialize the three term recurrence matrices V m , V p <— 0 G c^xiv,, 

7: for l = 0,... , M do 
8: Accumulate (i <- (i + y- Tr[W*V c \. 

9: V p ^{2- 5 W )AV C - V m V . 

10: Vrn — Vc, Vc — Vp. 

11: end for 

12: for i = 1,..., Nt do 

13: Compute 0 CT (C) «- J2?to 

14: end for 


is introduced [20]. On the other hand, DGC introduces a Gaussian regularization 
from the beginning, and the DGC expansion ([6]) is not a formal expansion, and its 
accuracy can be relatively easily analyzed. The proof of the accuracy of the DGC 
expansion can be obtained via the same techniques used in e.g. [551 US] !3D], and is 
given below for completeness. 

Let A: be a non-negative integer, and P& be the set of all polynomials of degrees 
less than or equal to k with real coefficients. For a real continuous function / on 
[—1,1], the best approximation error is defined as 


(9) 


Mf) 


min 

pe P fc 


jll/-p|loo 


max \f(x) — p(x)\ 


It is known that such best approximation error is achieved by Chebyshev polyno¬ 
mials |31| . Consider an ellipse in the complex plane C with foci in —1 and 1, and 
a > 1,6 > 0 be the half axes so that the vertices of the ellipse are a, —a, *6, —ib, 
respectively. Let the sum of the half axes be x = a + 6, then using the identity 
a 2 — b 2 = 1 we have 


a = 


x 2 + i 

2X 


b = 


X 2 ^ 1 

2x 


Thus the ellipse is determined only by %, and such ellipse is denoted by £ x . Then 
Bernstein’s theorem m is stated as follows. 


Theorem 2 (Bernstein). Let f(z) be analytic in £ x with \ > 1, and f(z) is a real 
valued function for real z. Then 


( 10 ) 


Ek{f) < 


2 M{ X ) 

x fe (x-1)’ 
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where 

(11) M(x) = sup \f(z)\. 

ze£ x 


Using Theorem [2] a quantitative description of the approximation properties for 
Gaussian functions is given in Theorem [3] 

Theorem 3. Let A £ C NxN a Hermitian matrix with spectrum in (—1,1). For 
any t€K, the error of a M-term DGC expansion ([h]) is 

(12) \Tr[g„(tI-A)]-Tr[gM(tI-A)}\ < ^(1 + C 2f x)- M , 

a 

where Ci,C 2 are constants independent of A as well as a,M,t. 


Proof. For any t £ M, er > 0, the Gaussian function g a (t — •) is analytic in any 
ellipse £ x with y > 1. then 


M(x)= sup \ga(t- (x + iy))\ < — sup e=a 

z=x+iyd£ x z—x+iyd£ x 


1 (x-i) 

< —e 8<>- a 
“ N 


For any a > 0, let 


(13) x = l + acr, 
then x ~ x — 2<rcr, and 

1 o 2 

(14) M( 1 + aa) < —e 2 . 

Then the error estimate follows from Theorem [2] that 

E M {g<r(t- •)) < (i + aa)~ M . 

Finally 

\Tv\g a (tI-A)]-Tr\g?(tI-A)]\ 

<N\\g <7 (tI-A)-g A /(tI-A)\\ 2 

=NE M {ga(t - •)) < — eTr(l + acr)~ M . 
aa 

2 

The theorem is then proved by defining C\ — , C 2 = a. Since a can be chosen 

to be any constant due to the analyticity of the Gaussian function in the complex 
plane, both C\ and C 2 are independent of A as well as <r, M, t. □ 


Theorem ^indicates that the error of the DGC algorithm is split into two parts: 
the error of the Chebyshev expansion (approximation error) and the error due to 
random sampling (sampling error). According to Theorem [3j it is sufficient to 
choose M to be 0(a~ 1 |log cr|) to ensure that the error of the Chebyshev expansion 
is negligible. Therefore the error of the DGC mainly comes from the sampling error, 
which decays slowly as 
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3. Spectrum sweeping method for estimating spectral densities 

In this section we present an alternative randomized algorithm called the spec¬ 
trum sweeping method for estimating spectral densities. Our numerical results in¬ 
dicate that the spectrum sweeping method can significantly outperform Hutchinson 
type methods in terms of accuracy, as the number of random vectors N v becomes 
large. The main tool is randomized methods for low rank matrix decomposition 
which is briefly introduced in section [3. 1[ The spectrum sweeping method is given 
in section |3.2| and its complexity is analyzed in section |3.3| 

3.1. Randomized method for low rank decomposition of a numerically 
low rank matrix. Consider a square matrix P £ C NxN , and denote by r the 
rank of P. If r <C N, then P is called a low rank matrix. Many matrices from 
scientific and engineering computations may not be exactly low rank but are close 
to be a low rank matrix. For such matrices, the concept of numerical rank or 
approximate rank can be introduced, defined by the closest matrix to P in the 
sense of the matrix 2-norm. More specifically, the numerical e-rank of a matrix P, 
denoted by r e with respect to the tolerance e > 0 is (see e.g. [32] section 5.4) 

r e = min{rank(<5) : Q £ C JVxAr , ||P - Q\\ 2 < e}. 

In the following discussion, we simply refer to a matrix P with numerical e-rank 
r e as a matrix with numerical rank r. For instance, this applies to the function 
g G (tI — A) with small a, since the value g a (t — A j) decays fast to 0 when A j is 
away from t, and the corresponding contribution to the rank of g a (tl — A) can be 
neglected up to e level. As an example, for the ModES3D_4 matrix to be detailed 
in section[6| if we set u = 0.01 and t = 1.0, then the values {g a (tl — A)} sorted in 
non-increasing order is given in Fig. [T] where each A is an eigenvalue of A. If we 
set e = 10^ 8 , then the e-rank of g a (tl — A) is 59, much smaller than the dimension 
of the matrix A which is 64000. 



Figure 1. For the ModES3D_4 matrix, the values g a (tl — A) 
sorted in non-increasing order plotted in log scale with a = 
0.01, t = 1.0. Only the first 128 values larger than 10 -16 are shown. 


For a numerically low rank matrix, its approximate singular value decomposition 
can be efficiently evaluated using randomized algorithms (see e.g. 021131123). The 
idea is briefly reviewed as below, though presented in a slightly non-standard way. 
Let P £ <C NxN be a square matrix with numerical rank r <C N, and W £ M. NxNv 
be a random Gaussian matrix. If N v is larger than r by a small constant, then 
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with high probability, P projected to the column space of PW is very close to P 
in matrix 2-norm. Similarly with high probability, P projected to the row space 
of W*P is very close to P in matrix 2-norm. In the case when P is a Hermitian 
matrix, only the matrix-vector multiplication PW is needed. 

In order to construct an approximate low rank decomposition of a Hermitian 
matrix P, let us denote by Z = PW, then an approximate low rank decomposition 
of P is given by 

(15) P « ZBZ*. 


The matrix B is to be determined and can be computed in several ways. One choice 
of B can be obtained by requiring Eq. (15) to hold when applying W* and W to 
the both sides of the equation, i.e. 


K w := W*PW = W*Z « {W*Z)B{W*Z)* = ( W*Z)B(W*Z ). 

In the last equality we used that ( W* Z ) is Hermitian. Hence one can choose 
(16) B = (W*Zy = K' w . 

Here K ^ is the Moore-Penrose pseudo-inverse (see e.g. [32j, section 5.5) of the 
matrix Kw- In Alg. [3] we summarize the algorithm for constructing such a low 
rank decomposition. 


Remark. In the case when Kw is singular, the pseudo-inverse should be handled 
with care. This will be discussed in section 3.2 Besides the choice in Eq. (16), 
another possible choice of the matrix B is given by requiring that Eq. (151 holds 
when applying Z* and Z to the both sides of the equation, i.e. 

Z*PZ « ( Z*Z)B{Z*Z ), 


and hence one can choose 
(17) 


B = {Z*Z)\Z*PZ)(Z*Zy. 


Note that Eq. (17) can also be derived from a minimization problem 

12 


min II ZBZ* - P| 

B 


F ' 


However, the evaluation of Z*PZ is slightly more difficult to compute, since the 
matrix-vector multiplication PZ needs to be further computed. In the discussion 
below we will adopt the choice of Eq. (16). 


Algorithm 3 Randomized low rank decomposition algorithm. 

Input: Hermitian matrix P £ £_ NxN with approximate rank r; 

Output: Approximate low rank decomposition P ta ZBZ*. 

1: Generate a random Gaussian matrix W £ R JVxiv '' where N v = r + c and c is a small 
constant. 

2: Compute Z -e- PW. 

3: Form B = {W*Z) ] . 











10 


L. LIN 


3.2. Spectrum sweeping method. The approximate low rank decomposition 
method can be used to estimate the DOS. Note that for each t, when the regular¬ 
ization parameter a is small enough, the column space of g a (tI—A) is approximately 
only spanned by eigenvectors of A corresponding to eigenvalues near t. Therefore 
for each t, g a {tl — A) is approximately a low rank matrix. Alg. [3] can be used to 
construct a low rank decomposition. Motivated from the DGC method, we can use 
the same random matrix W for all t. 


g a (tI-A)^Z(t)(W*Z(t)^Z*(t), 


and its trace can be accurately estimated as 


(18) 


Tr [g a (tl - A)] « Tr[(W*(Z*(t)Z(t))]. 


Here Z(t) = g„ (tl — A)W. We can use a Chebyshev expansion in Eq. ([6]) and 
compute g™(tI — A). 

The Chebyshev expansion requires the calculation of T[(A)W ., l = 0,..., M. 
Note that this does not mean that all Ti(A)W need to be stored for all l. Instead 
we only need to accumulate Z(t) for each point t that the DOS is to be evaluated. 
Ti(A) only need to be applied to one random W matrix, and we can sweep through 
the spectrum of A just via different linear combination of all Ti{A)W for each t. 
Therefore we refer to the algorithm a “spectrum sweeping” method. 

As remarked earlier, the pseudo-inverse should be handled with care. There are 
two difficulties associated with the evaluation of K\ v . First, it is difficult to know a 
priori the exact number of vectors N v that should be used at each t, and N v should 
be chosen to be large enough to achieve an accurate estimation of the DOS. Hence 
the columns of Z are likely to be nearly linearly dependent, and Kw becomes singu¬ 
lar. Second, although g a {tl — A) is by definition a positive semidehnite matrix, the 
finite term Chebyshev approximation g^f (tl — A) may not be positive semidehnite 
due to the oscillating tail of the Chebyshev polynomial. Fig. [2] (a) gives an example 
of such possible failure. The test matrix is the ModES3D_l matrix to be detailed 
in section [bj The parameters are o = 0.05, N v = 50. When computing the pseudo¬ 
inverse, all negative eigenvalues and positive eigenvalues with magnitude less than 
10- 7 times the largest eigenvalue of Kw are discarded. Fig. [2] (a) demonstrates 
that when a relatively small number of degrees of polynomials M = 400 is used, the 
treatment of the pseudo-inverse may have large error near t = 0.9. This happens 
mainly when the degrees of Chebyshev polynomials M is not large enough. Fig. [2] 
(b) shows that when M is increased to 800, the accuracy of the pseudo-inverse 
treatment is much improved. 

The possible difficulty of the direct evaluation of K\ v can be understood as 
follows. First, Theorem [2] suggests that for a strictly low rank matrix P, there is 
correspondence between the trace of the form in Eq. (181 and the solution of a 
generalized eigenvalue problem. 


Theorem 4. Let P £ <C NxN be a Hermitian matrix with rank r <C N and with 
eigen decomposition 

p = usu*. 

Here U £ C^xr an( ^ jj*jj = J. S £ R rxr is a real diagonal matrix containing the 
nonzero eigenvalues of P. For W £ C Nxp (p > r) and assume W*U is a matrix 
with linearly independent columns, then S can be recovered through the generalized 
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Figure 2. For the ModES3D_l matrix, compute the DOS using 
the spectrum sweeping method by computing the pseudo-inverse 
(“Pinv”) and by computing the generalized eigenvalue problem 
(“Eig”) using (a) a low degree Chebyshev polynomial M = 400 
(b) a high degree Chebyshev polynomial M = 800. 


eigenvalue problem 

(19) ( W*P 2 W)C = (W*PW)CE. 

Here C £ C pxr and S £ R rxr is a diagonal matrix with diagonal entries equal to 
those of S up to reordering. Furthermore, 

(20) Tr [( W*PW)\W*P 2 W)\ = Tr[S] = Tr [P]. 

Proof. Using the eigen decomposition of P, 

(' W*P 2 W)C = ( W*U)S 2 {U*WC ), ( W*PW)CE = ( W*U)S(U*WC)Z. 
Since W*U £ <C pxr is a matrix with linearly independent columns, we have 

S 2 (U*WC) = S(U*WC)E, 


or equivalently 


S{U*WC) = (U*WC) 5. 


Since S' is a diagonal matrix we have S 
To prove Eq. (20), note that 


up to reordering of diagonal entries. 


(w*pw)^ = ([/*fu) t s~\w*uy. 


Therefore 

Tr [(W*PW)*(W*P 2 W)\ = Tr S~ 1 (W*U)\W*U)S 2 (U*W)] 

= Tr[S] = Tr [USU*] = Tr[P], 


□ 
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Although Theorem [4] is stated for exactly low rank matrices, it provides an 
practical criterion for removing some of the large, spurious contribution to the 
DOS such as in Fig. [2] (a). For g a (tl — A) which is numerically low rank, only the 
generalized eigenvalues within the range of g , i.e. the interval [0, should 

be selected. This motivated the use of Alg. [3] to solve the generalized eigenvalue 
problem 

(21) Z*ZC = (W*Z)CE. 

where S is a diagonal matrix only containing the generalized eigenvalues in the in- 
terval [°> wb]- A1 §- [4] only keeps the generalized eigenvalues within the possible 
range of g a . Our numerical experience indicates that this procedure is more stable 
than the direct treatment of the pseudo-inverse. The algorithm of the spectrum 
sweeping method is given in Alg. [5] 

Now consider the problematic point when using the pseudo-inverse in Fig. [2] 
(a). We find that the generalized eigenvalues 2 at that problematic point has one 
generalized eigenvalue 0.033, exceeding the maximally allowed range j = 
0.008. After removing this generalized eigenpair, the error of the DOS obtained by 
solving the generalized eigenvalue problem becomes smaller. Again when the degree 
of the Chebyshev polynomial M increases sufficiently large to 800, the accuracy of 
the generalized eigenvalue formulation also improves, and the result obtained by 
using the pseudo-inverse and by using the generalized eigenvalue problem agrees 
with each other, as illustrated in Fig. [ 2 ] (b). In such case, all generalized eigenvalues 
fall into the range [0, _ ]. 


Algorithm 4 Solve the generalized eigenvalue problem for the spectrum sweeping 
method. 

Input: Matrices Z,W £ C Nx Nv ; 

Smearing parameter a; 

Truncation parameter r. 

Output: Generalized eigenvalues E and generalized eigenvectors C. 

1: Compute the eigenvalue decomposition of the matrix 

W*.Z = USU*. 

S is a diagonal matrix with diagonal entries {si}. 

2: Let 5 be a diagonal matrix with all eigenvalues Sj > rmaij .s,;, and U be the corre¬ 
sponding eigenvectors. 

3: Solve the standard eigenvalue problem 

S~^U*Z*ZUS~ix = XE. 

H is a diagonal matrix with diagonal entries {G}- 
4: Let E be a diagonal matrix containing the generalized eigenvalues £; £ [0, „ ], 

and X be the corresponding eigenvectors. 

5: Compute the generalized eigenvectors C = US~^X. 


The SS-DGC method can be significantly more accurate compared to the DGC 
method. This is because the spectrum sweeping method takes advantage that dif¬ 
ferent columns of Z = g a (tl — A)W are correlated: The information in different 
columns of Z saturates as N v increases beyond the numerical rank of g a (tI—A ), and 
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Algorithm 5 Spectrum sweeping method using the Delta-Gauss-Chebyshev ex¬ 
pansion (SS-DGC) for estimating the DOS. 

Input: Hermitian matrix A with eigenvalues between (—1,1); 

A set of points at which the DOS is to be evaluated; 

Polynomial degree M; Smearing parameter a; 

Number of random vectors N v . 

Output: Approximate DOS {4>a(ti)}. 

1: Compute the coefficient {g,i{ti)}fL 0 for each ti, i = 1 ,Nt using Alg. [TJwith f(x) = 

Sa(® - ti). 

2: Generate a random Gaussian matrix W £ M. NxNv . 

3: Initialize the three term recurrence matrices Vm, V p <— 0 £ , y yy 

4: Initialize Z{U) «- 0 £ C NxN \i = 1 
5: for l = 0,..., M do 
6: for i = 1,..., Nt do 

7: Z(ti) 4— Z(ti) + 

8: end for 

9 : Vp 4— (2 — Sio)AV c — Vm- 

10 : V m 4 — V c , V c i — Vp. 

11: end for 

12: for i = 1 ,..., Nt do 

13: Compute (j> a {ti) = Tr[E(t;)] where S(tj) is a diagonal matrix obtained by solving 

the generalized eigenvalue problem 

Z*(ti)Z{ti)C{ti) = W*(U)Z(U)C(U)E(ti) 

using Alg. [4] 

14: end for 


the columns of Z become increasingly linearly dependent. Comparatively Hutchin¬ 
son’s method neglects such correlated information, and the asymptotic convergence 
rate is only 0(N V 2 ). 


3.3. Complexity. The complexity of the DOS estimation is certainly problem 
dependent. In order to measure the asymptotic complexity of Alg. [5] for a series of 
matrices with increasing value of A, we consider a series of matrices are spectrally 
uniformly distributed, i.e. the spectral width of each matrix is bounded between 
(—1,1), and the number of eigenvalues in any interval (£i, ^ 2 ) is proportional to N. 
In other words, we do not consider the case when the eigenvalues can asymptotically 
be concentrated into one point. In the complexity analysis below, we neglect any 
contribution on the order of log A. In section [4] we will give a detailed example for 
which the assumption is approximately satisfied. 

Alg .[ 5 ] scales as 0(N%) with respect to the number of random vectors N v . There¬ 
fore it can be less efficient to let N v grow proportionally to the matrix size A. In¬ 
stead it is possible to choose N v to be a constant, and to choose the regularization 
parameter a to be 0(A -1 ) so that g a (tl — A) is a matrix of bounded numerical 
rank as A increases. Eq. (12) then states that the Chebyshev polynomial degree 
M should be O(N). We denote by c matV ec the cost of each matrix-vector multipli¬ 
cation (matvec). We assume c ma t vec ~ O(N). We also assume that N v is kept to 
be a constant and is omitted in the asymptotic complexity count with respect to 
A and A t . 







14 


L. LIN 


Under the assumption of spectrally uniformly distributed matrices, the compu¬ 
tational cost for applying the Chebyshev polynomial to the random matrix W is 
Cma,tvecMN v ~ 0(N 2 ). The cost for updating all {Z(ti)} is N t MNN v ~ 0(N 2 N t ). 
The cost for computing the DOS by trace estimation is 0(N t NN 2 + N t N 3 ) ~ 
0(N t N). So the total cost is 0(N 2 N t ). 

The memory cost is dominated by the storage of the matrices {Z(ti)}, which 
scales as 0(NN t N v ) ~ 0(NN t ). 

If the DOS is evaluated at a few points with N t being small, the spectrum sweep¬ 
ing method is very efficient. However, in some cases such as the trace estimation as 
discussed in section [5j N t should be chosen to be 0(N). Therefore the computa¬ 
tional cost of SS-DGC is 0(N 3 ) and the memory cost is 0(N 2 ). This is undesirable 
and can be improved as in the next section with a more efficient implementation. 


4. A ROBUST AND EFFICIENT IMPLEMENTATION OF THE SPECTRUM SWEEPING 

METHOD 

The SS-DGC method can give very accurate estimation of the DOS. However, 
it also has two notable disadvantages: 

(1) The SS-DGC method requires a rough estimate of the random vectors N v . 
If the number of random vectors N v is less than the numerical rank of 
g a (t,I — A), then the estimated DOS will has 0(1) error. 

(2) The SS-DGC method requires the formation of the Z(t) matrix for each 
point t. The computational cost scales as <D(N 2 N t ) and the memory cost 
is O(NNt). This is expensive if the number of points N t is large. 

In this section we provide a more robust and efficient implementation of the 
spectrum sweeping (RESS) method to overcome the two problems above. The 
main idea of the RESS-DGC method is to have 

(1) a hybrid strategy for robust estimation of the DOS in the case when the 
number of vectors N v is insufficient at some points with at least 0(l/y/N v ) 
accuracy. 

(2) a consistent method for directly computing of the matrix Z*(ti)Z(ti) for 
each point £, and avoiding the computation and storage of Z(ti). 


4.1. A robust and efficient implementation for estimating the trace of a 
numerically low rank matrix. Given a numerically low rank matrix P, we may 
apply Alg. i to compute its low rank approximation using a random matrix of size 
N x N v . Let us denote the residual by 


( 22 ) 


R :=P-ZBZ*. 


If N v is large enough, then ||P|| F should be very close to zero. Otherwise, Hutchin¬ 
son’s method can be used to estimate the trace of R as the correction for the trace 
of P. According to Theorem [T] if ZBZ* is relatively a good approximation to P, 
the variance for estimating Tr[P] can be significantly reduced. 

To do this, we use another set of random vectors W £ C JVxJV ”, and 


(23) Tr[P] 


1 Ty[W*RW] = ~ (rr[W*PW] - Tr[(WZ)B(Z*W)]\ 


Eq. (23) still requires the storage of Z. As explained above, storing Z can become 
expensive if we use the same set of random vectors W and W but evaluate the DOS 
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at a large number of points {ti}. In order to reduce such cost, note that 

Z*Z = W*P*PZ = W*{P 2 W). 


If we can compute both PW and P 2 W, then Z does not need to be explicitly 
stored. Instead we only need to store 


K w = W*{PW), K z = W*(P 2 W). 


This observation is used in section |4~2| where PW and P 2 W are computed sepa¬ 
rately with Chebyshev expansion. 

Similarly, for the computation of the correction term (23), we can directly com¬ 
pute the cross term due to W as 


K c = W*Z = W*(PW), K W = W*{PW). 

Alg. [3] involves the computation of the pseudo-inverse of K\y : which is in practice 
computed by solving a generalized eigenvalue problem using Alg. [4j The hybrid 
strategy in Eq. ( |23| ) is consistent with the generalized eigenvalue problem, in the 
sense that 


(24) 


E a [w*ZCC*Z*w\ = Tr [ZCC*Z*] = Ti[C*{Z*Z)C} = Tr[S]. 


The last equality of Eq. (24) follows from that 2, C solve the generalized eigenvalue 
problem as in Alg. [4] 

In summary, a robust and efficient randomized method for estimating the trace 
of a low rank matrix is given in Alg. [6] 


Algorithm 6 Robust and efficient randomized method for estimating the trace of 
a numerically low rank matrix. 

Input: Hermitian matrix P G c JVxJV - Number of randomized vectors N v , N v 


Output: Estimated Tr[P]. 


1: Generate random Gaussian matrices W G M. NxNv and W G C NxN ' 
2: Compute Kw = W*{PW), I< z = W*{P^W). 

3: Compute K c = W*(PW),K w = W*{PW). 

4: Solve the generalized eigenvalue problem 

K Z C = K W CE, 


using Alg. [4] 

5: Compute the trace 


Tr[P] « Tr[S] + i (Tr[A~] - Tr [K c CC*K* G ]\. 

iVu 


4.2. Robust and efficient implementation of the spectrum sweeping method. 

In order to combine Alg. [6] and Alg. [5] to obtain a robust and efficient implementa¬ 
tion of the spectrum sweeping method, it is necessary to evaluate (g a (tl — A)) 2 W, 
where 


(gAti-A )) 2 


1 (tr-A ) 2 

-e ^ 

TV 2 27rcr 2 
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In order to do this, one can directly compute (g a {tl — ^4)) 2 using an auxiliary 
Chebyshev expansion as follows 


M 


(25) 


(g a (tl - A)f 


1=0 


Proposition [5] states that if the expansion is chosen carefully, the Chebyshev ex¬ 
pansion ^ and (25) are fully consistent, i.e. if g a is expanded by a Chebyshev 
polynomial expansion of degree M/2 denoted by go^ 2 , we can expand {g a ) 2 us¬ 
ing a Chebyshev polynomial expansion of degree M denoted by gff 1 . These two 
expansions are consistent in the sense that (g^ 2 ) 2 = g^ 1 ■ 


Proposition 5. Let M be an even integer, and P is a matrix polynomial function 
of A 

M/2 

P=J2iaTi(A). 

1=0 

Then 


(26) 


M 

p 2 = j2"iW), 

1=0 


where 


vi 


2 — S l0 

7T 



1 

\/l — x 2 


Ti(x) 


/M/2 \ 2 

y gkTk(x) d.r, 

\k=o ) 




Proof. Using the definition of P, 

M/2 

P = 'y ' p p p q T p (A)T q (A). 

p,q=0 

Since 0 < p+q < M, P 2 is a polynomial of A up to degree M, and can be expanded 
using a Chebyshev polynomial of the form (26). □ 

The expansion coefficient ig ’s can be obtained using Alg. |T] with numerical inte¬ 
gration, and we have a consistent and efficient method for estimating the DOS. 


Theorem 6. Let A £ <C NxN b e a Hermitian matrix, and W £ M NxNv be a random 
Gaussian matrix. For any t £ (—1,1), let g^^ 2 (tl—A) be theM/2 degree Chebyshev 
expansion 

M/2 

(27) g^ 2 (tI-A)=J2Mt)T l (A), 

1=0 

and the coefficients {zg}££ 0 are defined according to PropositionDefine Z(t) = 
ga^~(tI — A)W, and 

(28) K w (t) = W*Z(t), K z (t) = W* ^ ui(t)T l (A)W^j . 
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Let w be a random Gaussian vector, and 

(29) K c {t)=w*Z(t), K w (t)=w*g^/ 2 (tI-A)w. 

Then 

(30) Tr [g¥‘'\tl - A)] = E s (k~( t) - K c {t)C{t)C*{t)K* c (t)) + Tr[H(t)]. 

Here C(t),E(t) solves the generalized eigenvalue problem 

(31) K z (t)C(t) = K w (t)C(t)3{t), 
using Alg. [7j 


(ho-’ 1 " 2 (tl — A)^ can be exactly computed using a Cheby- 


Proof. By Proposition 
shev of degree M with coefficients Therefore 


K z (t) = Z*(t)Z(t) = W*(j2 vi(t)Ti(A)W^ . 


\ 2=0 


The consistency between Hutchinson’s method and the estimation of Tr[S(t)] fol¬ 
lows from Eq. (241. □ 


Following Theorem [6j the RESS-DGC algorithm is given in Alg. [7] Compared 
to DGC and SS-DGC method, the RESS-DGC method introduces an additional 
parameter N v . In the numerical experiments, in order to carry out a fair in the 
sense the total number of random vectors used are the same, i.e. for RESS-DGC, we 
always choose N v +N v to be the same number of random vectors N® GG = TV® S ~ DGC 
used in DGC and SS-DGC, respectively. For instance, when N® GG is relatively 
small, in RESS-DGC we can choose N v = N v = ^N® GG , i.e. half of the random 
vectors are used for low rank approximation, and half for hybrid correction. When 
we are certain that N v is large enough, we can eliminate the usage of the hybrid 
correction and take N v = N GGC and N v = 0. 


4.3. Complexity. Following the same setup in section |4~3l we assume that a series 
of matrices are spectrally uniformly distributed. We assume the Chebyshev poly¬ 
nomial degree M ~ O(N), and c matV ec ~ O(N). In the complexity analysis below, 
we neglect any contribution on the order of log TV. We also assume that N v is kept 
as a constant and is omitted in the asymptotic complexity count with respect to TV 
and N t . 

In terms of the computational cost, the computational cost for applying the 
Chebyshev polynomial to the random matrix W is c matvec MN v ~ 0(N 2 ). The cost 
for updating Kw(ti) and Kz(U) is N t MN 2 ~ 0(NN t ). The cost for computing 
the DOS by trace estimation for each f, is 0(N t N, „) ~ 0(N t ). So the total cost is 
0(NN t + TV 2 ). 

In terms of the memory cost, the advantage of using Alg. [7] also becomes clear 
here. The matrices {Z(£,}} do not need to be computed or stored. Using the three- 
term recurrence for Chebyshev polynomials, the matrices Kz(t ), Kc(t ), K^{t) 

can be updated gradually, and the cost for storing these matrices are 0{N 2 N t ~ 
TV t ). The cost for storing the matrix W is 0(NN v ) ~ O(N). So the total storage 
cost is 0(N + TV t ). 

If we also assume N t ~ O(N) due to the vanishing choice of er, then the compu¬ 
tational cost scales as 0(N 2 ) and the storage cost scales as O(N). 
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Algorithm 7 Robust and efficient spectrum sweeping with Delta-Gauss-Chebyshev 
(RESS-DGC) method for estimating the DOS. 


Input: Hermitian matrix A with eigenvalues between (—1,1); 

A set of points {bjjJj at which the DOS is to be evaluated; 
Polynomial degree M ; 

Smearing parameter cr; 

Number of random vectors N v , N v . 

Output: Approximate DOS {c/> CT (fi)}. 


M 

1: Compute the coefficient {Atz(£i)};=o for each ti,i = 1,. 

and let gi(U) = 0, l = + 1,..., M. 

2: Compute the coefficient {i/i{ti)}ff 0 for each ti,i = 1,.. 


.. ,N t using Alg. [l] for Eq. (§, 
., N t using Alg. [l] for Eq. (251. 


3: Generate random Gaussian matrices W £ M JVxJV ” and W £ R NxNv . 

4: Initialize the three term recurrence matrices V m , V p «— 0 G C NxNv , V c t— W; V m , V p ■£- 

Q £ C NXN V 'y c yy 

5: Initialize matrices Kw{U), Kz{U) <— 0 € j _ i 3 ... ? TV*; Kc(ti) £- 0 £ 

c n„xiv„ and 0 £ 

6: for l = 0,..., M do 

7: Compute X w t- 1A*R C , X c <- W*V C , X w <- W*V C . 

8: for i = 1,..., Nt do 


9 

10 

11 

12 

13 

14 

15 

16 
17 


Accumulate Kw(U ) Kw(ti) + im(U) X w , K z (U) £- Kz{U) + vi(ti)X W - 
Accumulate Kc(U) £- K C {U) + im{U) X c , Kyy{U) «- Kyy{U) + m{U)Xyy. 

end for 

V p «- (2 - S l0 )AV c - Vm- 
Vm *- Vc Vc *- Vp. 

V p <r-(2-6 l0 )AV c -V m . 

Vm V c , Vc £~ V p . 

end for 

for i = 1 ,... ,Nt do 


18: Compute ) <— Tr g a 2 (til — A) 

Tr[H(£i)], where H (ti),C(ti) are computed from Alg. [ 4 ] 
19: end for 


A If [h' w (U) - Kc(U)C(U)C*(U)K* c (U)\ - 


5. Application to trace estimation of general matrix functions 

As an application for the accurate estimation of the DOS, we consider the prob¬ 
lem of estimating the trace of a smooth matrix function as in Eq. ([2|. In general 
fit) is not a localized function on the real axis, and Alg. [6] based on low rank 
decomposition cannot be directly used to estimate Tr[/(A)]. 

However, if we assume that there exists cr > 0 so that the Fourier transform 
of /(f) decays faster than the Fourier transform of g a {t), where g a is a Gaussian 
function, then we can find a smooth function /(f) satisfying 




f(s)g<r{t 


s) ds = /(f). 


(32) 


— OO 
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The function f{t) can be obtained via a deconvolution procedure. Formally 
1 


(33) 


N 


L 


Tr [f(A)} = / d t = 


— OO 
CO 


/ 7 

J —CO J —oo 


f(s)g a (t - dtds 


ds. 


Eq. (33) states that the trace of the matrix function f(A) can be accurately com¬ 


puted from the integral of f(s)(f> a (s), which is a now smooth function. Since the 
spectrum of A is assumed to be in the interval (—1,1), the integration range in 
Eq. (33) can be replaced by a finite interval. The integral can be evaluated accu¬ 


rately via a trapezoidal rule, and we only need the value of the DOS (j> a evaluated on 
the points requested by the quadrature. In such a way, we “transfer” the smooth¬ 
ness of f(t) to the regularized DOS without losing accuracy. 


The deconvolution procedure (32) can be performed via a Fourier transform. 


Assume that the eigenvalues of A are further contained in the interval (—a, a) C 
(—1,1) with 0 < a < 1. Then the Fourier transform requires that the function 
f(t) is a periodic function on a interval containing (—a, a), which is in general not 
satisfied in practice. However, note that the interval in Eq. © does not require the 
exact function f{t). In fact 

Tr[/(A)] = [ J7)<7) dt = j h(t)4>(t) d t 

J—a J—a 

for any smooth function h(t) satisfying 

(34) h(t) = f(t), t £ (—a, a). 

In particular, h.(t) can be extended to be a periodic function on the interval [—1,1]. 
In this work, we construct h{t) as follows. 


(35) 


Ht) = + 


/(-!) + /(!) 


(l-7T(f)). 


We remark that the constant in Eq- ( |35| is not important and can be 

in principle any real number. Here tt( t) is a function with 7r(f) = 1 for t £ (—a, a), 
and smoothly goes to 0 outside (—a, a). It is easy to verify that such choice of h(t) 
satisfies Eq. (34) and is periodic on (—1,1). There are many choice of 7r(f), and 
here we use 


(36) 


r(f) = 


erf 


ltd — 2 1 


— erf 


— 1 — a — 2t 


Here erf is the error function. If a is chosen to be small enough, then 7r(f) as defined 
in Eq. p6l satisfies the requirement. 
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Algorithm 8 Spectrum sweeping method for estimating the trace of a matrix 
function. 


Input: Hermitian matrix A with eigenvalues between (—a, a) where 0 < a < 1; 

Smooth function f(t); 

Smearing parameter a, a. 

Output: Estimated value of Tr[/(A)]. 


1 : Compute auxiliary function h(t) according to Eq. (351. 

2 : Compute {/(ti)}^i satisfying (/ * 5<r)(t) = h(t) on (—1,1) through the Fourier trans¬ 
form on a uniform set of points {f;}^ with spacing At = t 2 — ti. 

3: Use one of the algorithms to compute {()}. 

4: Compute Tr[/(A)] « NAtJ2?=i 


6. Numerical examples 

In this section we demonstrate the accuracy and efficiency of the SS-DGC and 
RESS-DGC methods for computing the spectral densities and for estimating the 
trace. For the asymptotic scaling of the method, we need a series of matrices 
that are approximately spectrally uniformly distributed. These are given by the 
ModES3DJX matrices to be detailed below. In order to demonstrate that the 
methods are also applicable to general matrices, we also give test results for two 
matrices obtained from the University of Florida matrix collection [56) . All the 
computation is performed on a single computational thread of an Intel i7 CPU 
processor with 64 gigabytes (GB) of memory using MATLAB. 

As a model problem, we consider a second order partial differential operator A 
in a three-dimensional (3D) cubic domain with periodic boundary conditions. For 
a smooth function u{x, y, z), Au is given by 

(Au)(x, y, z) = -Au(x,y,z) + V(x,y,z)u(x,y,z). 

The matrix A is obtained from a 7-point finite difference discretization of A. In 
order to create a series of matrices, first we consider one cubic domain SI = [0,L] 3 
and V(x, y, z) is taken to be a Gaussian function centered at (L/ 2, L/ 2, L/ 2) T . This 
is called a “unit cell”. The unit cell is then extended by n times along the x,y,z 
directions, respectively, and the resulting domain is [0,nL] 3 and V(x,y,z) is the 
linear combination of n 3 Gaussian functions. Such matrix can be interpreted as a 
model matrix for electronic structure calculation. Each dimension of the domain 
is uniformly discretized with grid spacing h, and the resulting matrix A is denoted 
by ModES3D_X where X is the total number of unit cells. Here we take L = 6 and 
h = 0.6. Some characteristics of the matrices, including the dimension, the smallest 
and the largest eigenvalue are given in Table [l] Fig. [3] (a), (b) shows the isosurface 
of one example of such potential for the matrix ModES3D_l, and ModES3D_8, 
respectively. Fig. [4] shows the DOS corresponding to low lying eigenvalues for the 
matrices with X = 1, 8 , 27,64 for a fixed regularization parameter a = 0.02, which 
indicate that the spectral densities is roughly uniform. 

6.1. Spectral densities. In order to compare with the accuracy of the DOS, the 
exact DOS is obtained by solving eigenvalues corresponding to the region of inter¬ 
est for computing the DOS. We use the locally optimal block preconditioned con¬ 
jugate gradient (LOBPCG) )3?J method. The LOBPCG method is advantageous 
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Matrix 

N 

min(ev) 

max(ev) 

ModES3D_l 

1000 

-2.22 

32.23 

ModES3D_8 

8000 

-2.71 

31.31 

ModES3D_27 

27000 

-2.75 

31.30 

ModES3D 64 

64000 

-2.76 

32.30 

pe3k 

9000 

8 x lO’ 6 

127.60 

shwater 

81920 

5.79 

20.30 


Table 1. Characteristics of the test matrices. 



(a) ModES3D_l (b) ModES3D_8 

Figure 3. Isosurface for V(x,y,z). 


ModES3D_l 


0.05 
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ModES3D_8 

0.12 

- 1 AA AAj 

0.1 

-2 ModE^3D_27 ° 

0.22 

0.21 

a a a AyM. 

0.2, 

5-2-1 0 

ModES3D_64 

0.32 

0.31 



-3 -2 -1 0 

t 


Figure 4. Shape of the DOS for a series of matrices with a = 0.02. 


for solving a large number of eigenvalues and eigenvectors since it can effectively 
take advantage of the BLAS3 operations by solving all eigenvectors simultaneously. 
The number of eigenvectors to be computed is set to be 35X for the test matrices 

























22 


L. LIN 


ModES3D_X with X = 1,8,27,64, respectively, and the highest eigenvalue ob¬ 
tained with such number of eigenvectors is slightly larger than 1.0. The tolerance 
of LOBPCG is set to be 10 -8 and the maximum number of iterations is set to be 
1000 . 

We measure the error of the approximate DOS using the relative L 1 error defined 
as follows. Denote by <j> a (ti) the approximate DOS evaluated at a series of uniformly 
distributed points U, and by (j) a (U) the exact DOS obtained from the eigenvalues. 
Then the error is defined as 


(37) 


Error = 


£i 




Ei 


For the DGC and SS-DGC method, an M -th order Chebyshev polynomial is 
used to evaluate Z(ti). For the RESS-DGC method, an M-th order Chebyshev 
polynomial should be used to expand Z*(ti)Z(ti). Following Theorem i only 
an M/2-th order Chebyshev polynomial can be used to evaluate Z(ti). Similarly 
when N v random vectors are used for DGC and SS-DGC, RESS-DGC is a hybrid 
method containing two terms. As discussed in section [4.2[ the number of random 
vectors used in DGC and SS-DGC are the same i.e. Ny C = Ny S ~ DGG . For 
RESS-DGC, we use iV EESS ~ DG = iiV EGG for the low rank approximation, and 
IV EESS “ DGG = t]V EGC for the hybrid correction. This setup makes sure that all 
methods perform exactly the same number of matrix-vector multiplications. 

Fig. [5] (a) shows the error of the three methods for the ModES3D_8 matrix when a 
very small number of random vectors N v = 40 is used, with increasing polynomial 
degrees M from 200 to 3200. Here we use a = 0.05. Note the relatively large 
polynomial degree is mainly due to the relatively large spectral radius compared to 
the desired resolution as in Table [l] This can be typical in practical applications. 
DGC is slightly more accurate for low degree of polynomials, but as M increases, 
RESS-DGC becomes more accurate. Fig. [ 5 ] (b) shows the error when a relatively 
large number of random vectors N v = 160 is used. When M is large enough, 
both SS-DGC and RESS-DGC can be significantly more accurate than DGC. SS- 
DGC is slightly more accurate here, because it uses the Chebyshev polynomials 
and random vectors more optimally than RESS-DGC, though the computational 
cost can be higher when spectral densities at a large number of points N t need to 
be evaluated. 

Fig. [6] (a) shows the comparison of the accuracy of three methods for a rela¬ 
tively low degree of polynomials M = 800 and with increasing number of random 
vectors N v . When N v is small, SS-DGC has 0(1) error, and this error is much 
suppressed in RESS-DGC thanks to the hybrid correction scheme. It is interesting 
to see that RESS-DGC outperforms DGC for all choices of N v , but its accuracy is 
eventually limited by the insufficient number of Chebyshev polynomials to expand 
the Gaussian function. SS-DGC is more accurate than RESS-DGC when N v is 
large enough. This is because in such case the low rank decomposition captures 
the correlation between the results obtained among different random vectors more 
efficiently. On the contrary, Hutchinson’s method for which DGC relies on only 
reduces the error only through direct Monte Carlo sampling. Fig. [6] (b) shows the 
case when a relatively large number of polynomials M = 2400 is used. Again for 
small N v , RESS-DGC reduces the large error compared to the SS-DGC method, 
while for large enough N v both SS-DGC and RESS-DGC can be very accurate. 
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N v = 40, a — 0.05 N„ = 160, a = 0.05 




Figure 5. For the ModES3D_8 matrix, the error of the DOS with 
respect to increasing polynomial degrees for (a) A small number 
of random vectors N v = 40 (b) A large number of random vectors 
N v = 160. 


M = 800,cr = 0.05 


M = 2400, o = 0.05 




(a) (b) 


Figure 6. For the ModES3D_8 matrix, the error of the DOS with 
respect to increasing random vectors for (a) A low polynomial de¬ 
gree M = 800 (b) A high polynomial degree M = 2400. 


In both SS-DGC and RESS-DGC methods, the parameter a is important since 
it determines both the degrees of Chebyshev polynomial to accurately expand g a , 
and the number of random vectors needed for accurate low rank approximation. 
Fig. [7] shows the error of DGC, SS-DGC and RESS-DGC with a varying from 0.02 
to 0.1. Here we choose M = 120/cr and N v = 3200er. This corresponds to the 
case when a relatively high degrees of Chebyshev polynomial and a relatively large 
number of random vectors are used in the previous discussion. We observe that 
the scaling M ~ 0(ct _ 1 ) and N v ~ O(a) is important for spectrum sweeping type 
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methods to be accurate, and SS-DGC and RESS-DGC can significantly outperform 
DGC type methods in terms of accuracy. 



Figure 7. For the ModES3D_8 matrix, the error of the DOS with 
respect to different choices of a. 


In order to study the weak scalability of the methods using the ModES3D_X 
matrices, as given in the complexity analysis, the polynomial degrees M should be 
chosen to be proportional to X. Here M = 300AT for X = 1, 8, 27,64, respectively. 
Correspondingly er = 0.4/A' and N t = 5AT. This allows us to use the same number 
of random vectors N v = 150 for all matrices. Fig. [8] shows the wall clock time 
of the three methods. Both SS-DGC and LOBPCG are asymptotically 0(N 3 ) 
methods with respect to the increase of the matrix size, and the cubic scaling 
becomes apparent from X = 27 to X = 64. RESS-DGC is only slightly more 
expensive than DGC and scales as 0(N 2 ). For ModES3D_64, the wall clock time 
for DGC, RESS-DGC, SS-DGC and LOBPCG is 2535, 3293, 11979, 49389 seconds, 
respectively. Here RESS-DGC is 15 times faster than LOBPCG and is the most 
effective method. Fig. [8] (b) shows the accuracy in terms of the relative L 1 error. 
Both SS-DGC and RESS-DGC can be significantly more accurate than DGC. We 
remark that since N v is large enough in all cases here, the efficiency of RESS-DGC 
can be further improved by noting that it only effectively uses half of the random 
vectors here, due to the small contribution from the other half of random vectors 
used for the hybrid correction. 


6.2. Trace estimation. As discussed in section [5j the accurate calculation of the 
regularized DOS can be used for trace estimation. To demonstrate this, we use the 
same ModES3D_X matrices, and let f(A) be the Fermi-Dirac function, i.e. 


Tr if (A)} = Tr 


1 


1 + exp(/l(A - fil) 

is to be computed. In electronic structure calculation, /r has the physical meaning 
of chemical potential, and (3 is the inverse temperature. The trace of the Fermi- 
Dirac distribution has the physical meaning of the number of electrons at chemical 
potential /i. Here /3 = 10.0,/r = —1.0. The value of a that can be used for the 
deconvolution procedure in Eq. (331 should be chosen such that after deconvolution 
/(s) is still a smooth function. Here we use a = 0.05 for X = 1, and a = 0.4/X for 
X = 8,27,64, respectively. The value of a for the smearing function in Eq. (36) is 
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Figure 8 . (a) Wall clock time of the DGC, SS-DGC and 
RESS-DGC methods compared with diagonalization method us¬ 
ing LOBPCG. (b) The error of the DOS computed at RESS-DGC 
and the DGC method. 


0.016. Correspondingly the polynomial degree M = 300X. The number of random 
vectors N v is kept to be 100 for all calculations. 

Fig. [9] shows the relative error of the trace for DGC, SS-DGC and RESS-DGC. 
We observe that SS-DGC and RESS-DGC can be significantly more accurate com¬ 
pared to DGC, due to the better use of the correlated information obtained among 
different random vectors. Again when N v is sufficiently large, SS-DGC is more 
accurate since the hybrid strategy in RESS-DGC is no longer needed here. 



Figure 9. Relative error for estimating the trace of Fermi-Dirac 
functions applied to ModES3D_X matrices. 
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6.3. Other matrices. In section [GTT[ we verified that both SS-DGC and RESS- 
DGC can obtain very accurate estimation of the DOS when the degrees of Cheby- 
shev polynomial and the number of random vectors are large enough, and RESS- 
DGC can lead to more efficient implementation. In this section we further verify 
that RESS-DGC can achieve high accuracy for other test matrices, when the degrees 
of polynomial and number of vectors N v is large enough. The test matrices pe3k 
and shwater are obtained from the University of Florida matrix collection [36] , and 
are used as test matrices in [20]. The character of the matrices is given in Table [l] 
Fig. 10 shows the DOS obtained from RESS-DGC for the pe3k matrix, compared 
to the DOS obtained by diagonalizing the matrix directly (“Exact”). The param¬ 
eters are a = 0.25 ,M = 4084, N v = 300 ,N t = 100. Since the goal is to demonstrate 
high accuracy, we turn off the hybrid mode in the RESS-DGC method by setting 


N v = 0. Fig. 10 shows that the error of RESS-DGC is less than 10 7 everywhere. 


The relatively large error occurs at the two peaks of the DOS, which agrees with the 
theoretical estimate that RESS-DGC needs more random vectors when the spec¬ 
tral density is large. Similarly Fig. m shows the same comparison for the shwater 
matrix, with er = 0.005,M = 16240, N v = 640, N v = 0, and N t = 100. More detailed 
comparison of the error and running time for the two matrices with increasing num¬ 
ber of random vectors N v is given in Tableland Table [3} respectively. We observe 
that the error of RESS-DGC rapidly decreases with respect to the increase of the 
number of random vectors, while the error of DGC only decreases marginally. We 
find that RESS-DGC only introduces marginally extra cost compared to the cost 
of the DGC method. 


0.014 
0.012 
0.01 
0.008 
0.006 
0.004 
0.002 

°4 

Figure 10. For the pe3k matrix, (a) numerically computed DOS 
by RESS-DGC, compared to the exact DOS, and (b) the error of 
the DOS computed by RESS-DGC. 



7. Conclusion and future work 

For large Hermitian matrices that the only affordable operation is matrix-vector 
multiplication, randomized algorithms can be an effective way for obtaining a rough 
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Figure 11. For the shwater matrix, (a) numerically computed 
DOS by RESS-DGC, compared to the exact DOS, and (b) the 
error of the DOS computed by RESS-DGC. 


Matrix 

N v 

Error RESS-DGC 

Error DGC 

pe3k 

100 

4.2 x 10" 2 

1.7 x 10~ 2 

pe3k 

200 

4.0 x 10~ 4 

1.3 x KT 2 

pe3k 

300 

4.8 x 10" 7 

1.1 x KT 2 

shwater 

320 

9.6 x 10 -3 

5.7 x KT 3 

shwater 

480 

1.2 x HT 4 

4.8 x 10' 3 

shwater 

640 

9.8 x 10“ 7 

3.7 x KT 3 


Table 2. Error of the DOS estimation for RESS-DGC and DGC 
with different numbers of random vectors N v . 


Matrix 

Ny 

Time RESS-DGC (s) 

Time DGC (s) 

pe3k 

100 

780 

764 

pe3k 

200 

1691 

1576 

pe3k 

300 

2825 

2720 

shwater 

320 

7371 

5513 

shwater 

480 

12310 

9487 

shwater 

640 

23479 

18495 


Table 3. Running time of the DOS estimation for RESS-DGC 
and DGC with different numbers of random vectors N v . 


estimate the DOS. However, so far randomized algorithms are based on Hutchin¬ 
son’s method, which does not use the correlated information among different ran¬ 
dom vectors. The accuracy is inherently limited to 0(1/\fN v ) where N v is the 
number of random vectors. 

We demonstrate that randomized low rank decomposition can be used as a dif¬ 
ferent mechanism to estimate the DOS. By properly taking into account the cor¬ 
related information among the random vectors, we develop a spectrum sweeping 
(SS) method that can sweep through the spectrum with a reasonably small number 
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of random vectors and the accuracy can be substantially improved compared to 
0(l/\/N v ). However, For spectrally uniformly distributed matrices with a large 
number of points to evaluate the DOS, the direct implementation of the spectrum 
sweeping method can have 0(N 3 ) complexity. We also present a robust and effi¬ 
cient implementation of the spectrum sweeping method (RESS). For spectrally uni¬ 
formly distributed matrices, the complexity for obtaining the DOS can be improved 
to 0(N 2 ), and the extra robustness comes from a hybridization with Hutchinson’s 
method for estimating the residual. 

We demonstrate how the regularized DOS can be used for estimating the trace of 
a smooth matrix function. This is based on a careful balance between the smooth¬ 
ness of the function and that of the DOS. Such balance is implemented through a 
deconvolution procedure. Numerical results indicate that this allows the accurate 
estimate of the trace with again 0(N 2 ) scaling. 

The current implementation of the spectrum sweeping method is based on Cheby- 
shev polynomials. Motivated from the discussion in WL Lanczos method would be 
more efficient than Chebyshev polynomials for estimating the DOS, and it would be 
interesting to extend the idea of spectrum sweeping to Lanczos method and compare 
with Chebyshev polynomials. We also remark that the spectrum sweeping method 
effectively builds a low rank decomposition near each point on the spectrum for 
which the DOS is to be evaluated. Combining the deconvolution procedure as in 
the trace estimation and the low rank decomposition could be potentially useful in 
some other applications to directly estimate the whole or part of a matrix function. 
For instance, in electronic structure calculation, the diagonal entries of the Fermi- 
Dirac function is needed to evaluate the electron density. These directions will be 
explored in the future. 
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